Introduction

In this study I look at data from the 2016 Presidential election by candidate, total contributions, location of, and number of contributors. For each of these variables I try to parse the data and create some visualizations that sum up the data well. I end up creating a statististical summary with bar and line plots, overview of the data by plotting, on a world map, some of the aforementioned varibles, and a general box plot to differentiate between gender and party in the election.

setwd("C:/USers/Zohaib/Desktop/Lectures/Udacity/R") #Setting the correct working directory.
Contributions<-read.csv("ContributionsEdited.csv",header = TRUE,na.strings="",encoding="UTF-8") #Reading in the data.
library(plyr) #Setting up the libraries for the following code.
library(ggplot2)
library(gridExtra)
library(maps)
library(ggmap)
library(devtools)
library(dplyr)
for (i in c("cand_nm","election_tp")){print(sort(table(Contributions[i]),decreasing=TRUE))} # Gives table counts for 
## 
##   Clinton, Hillary Rodham          Sanders, Bernard 
##                   3506081                   2063097 
##          Trump, Donald J. Cruz, Rafael Edward 'Ted' 
##                    782711                    557581 
##       Carson, Benjamin S.              Rubio, Marco 
##                    248227                    104814 
##                Paul, Rand                 Bush, Jeb 
##                     32485                     29459 
##            Fiorina, Carly           Kasich, John R. 
##                     27793                     25498 
##             Johnson, Gary               Stein, Jill 
##                     13429                     11020 
##             Walker, Scott            Huckabee, Mike 
##                      7338                      6460 
##  Christie, Christopher J.   O'Malley, Martin Joseph 
##                      5998                      5317 
##        Graham, Lindsey O.            McMullin, Evan 
##                      4397                      2540 
##      Santorum, Richard J.          Lessig, Lawrence 
##                      1723                      1339 
##    Perry, James R. (Rick)     Webb, James Henry Jr. 
##                       908                       800 
##             Jindal, Bobby         Pataki, George E. 
##                       765                       344 
##      Gilmore, James S III 
##                        88 
## 
##   P2016   G2016   O2016   P2020   P2012   P2018   G2015   P2015   G2106 
## 4789297 2634940    1891      76       3       3       2       2       1 
##   P2019 
##       1
                                                                                            # each candidate and 
                                                                                            # election.
Contributions$names<-lapply(as.character(Contributions$cand_nm),function(x) strsplit(x,",")[[1]][1]) 
Contributions$names<-factor(Contributions$names,levels=sort(as.vector(as.character(unique((Contributions$names)))))) #Creates and easier to use name variable, with just last names.

As expected, the most number of the contributions are going to the popular choice, Clinton, the fundraising upset Sanders, and election winner Trump. For the rest of the individuals, one can get a sense of how the election turned out just by looking at the differing number of contributions.

plotCont<-subset(Contributions,election_tp=="P2016"|election_tp=="G2016") #Creates data set to focus on 2016 election.
p1<-ggplot(aes(x=names,y=contb_receipt_amt,group=1),
       data=plotCont)+
  geom_bar(aes(fill='red'),stat="summary",fun.y=mean)+ #bar plot for mean of contbributions by candidate
  facet_wrap(~election_tp,nrow = 2)+                   #split into the primary and general elections.
  geom_point(alpha=.5,size=.75,stat="summary",fun.y=median)+
  geom_line(color='purple',stat="summary",fun.y=median)+ #line plot for median contributions.
  labs(x="Candidate Names",y="Mean Contribution Amount ($)")+
  theme(plot.title = element_text(size=22))+
  guides(fill=FALSE)
p2<-ggplot(aes(x=names,y=contb_receipt_amt/1000000,group=1),
           data=plotCont)+
  ylab("Contribution Total ($ mil)")+          #line plot for sum of contributions by candidate split by primary and
  ggtitle("Contribution by Candidate")+        #general elections.
  geom_line(size=1.25,color='steelblue',stat="summary",fun.y=sum)+
  theme(axis.title.x=element_blank())+
  facet_wrap(~election_tp,nrow=2)+
  guides(fill=FALSE)
grid.arrange(p2,p1)

From this one can notice something odd about the mean contributions as some of the individuals you would expect to have high mean and/or median contributions do not! Clinton, Sanders, and Trump all have low amounts in comparison.The three, on the other hand, do obviously have the highest total contributions, but one should find it very odd that some of the prominent candidates had low statistics, while someone like Jindal had or Lessig both had very high amounts. Then there are the negative means (most likely due to reimbursements), and contributions in the general election for people that were not even in the race such as Lessig who has a high mean amount of contributions then. Yet, when one looks at the count for Lessign he only has 1339 contributions so these numbers are a little less concerning.

ggplot(aes(x=contbr_occupation,y=contb_receipt_amt),
       data=plotCont)+
  geom_point(stat="summary",fun.y=mean)

ggplot(aes(x=contbr_city,y=contb_receipt_amt), #Two plots trying to create scatter plots by city and occupation versus
       data=plotCont)+                         #receipt amounts, but this obviously does not seem like the best plots.
  geom_point(stat="summary",fun.y=mean)

These plots do not really say anything, and it is near impossible to do anything worth-while with them.

Contributions$Gender<-NA
Contributions$Party<-NA        
Males=c("Rubio","Santorum","Perry","Carson","Cruz","Paul","Sanders","Huckabee",
        "Pataki","O'Malley","Graham","Bush","Trump","Jindal","Christie",
        "Walker","Webb","Kasich","Gilmore","Lessig","Johnson","McMullin")
Females=c("Clinton","Fiorina","Stein")
Republicans=c("Rubio","Santorum","Perry","Carson","Cruz","Fiorina","Paul","Huckabee",
              "Pataki","O'Malley","Graham","Bush","Trump","Jindal","Christie",
              "Walker","Kasich","Gilmore")
Democrats=c("Clinton","Sanders","Webb","Lessig")
Others=c("Stein","Johnson","McMullin")    #Creating Gender and Party variables based off of a list of the names that 
                                          #fall in each category.
for (i in Contributions$names){
  if (i %in% Males) {
    Contributions$Gender="M"
  }
  else if (i %in% Females) {
    Contributions$Gender="F"
  }
  else {
    Contributions$Gender=NA}}
for (i in Contributions$names){
  if (i %in% Republicans){
    Contributions$Party="R"
  }
  else if (i %in% Democrats) {
    Contributions$Party="D"
  }
  else if (i %in% Others){
    Contributions$Party="O"
  }
  else {
    Contributions$Party=NA
  }}
for (i in unique(Contributions$gender)){
  print(c(i,quantile(x=Contributions[Contributions$gender==i,]$contb_receipt_amt,probs = .85)))
  } #calculates 85th percentile for contributions by gender.
##         85% 
##   "M" "100" 
##         85% 
##   "F" "100"
for (i in unique(Contributions$party)){
  print(c(i,quantile(Contributions[Contributions$party==i,]$contb_receipt_amt,probs = .85)))
} #calculates 85th percentile for contributions by party.
##         85% 
##   "R" "250" 
##         85% 
##   "D" "100" 
##         85% 
##   "O" "500"

This data is just to get a sense of the data for the following plots, which need scaling manipulations.

ggplot(aes(x=gender,y=contb_receipt_amt),data=plotCont)+
  geom_boxplot(aes(alpha=.1))+
  coord_cartesian(ylim=c(quantile(Contributions[Contributions$gender=="F",]$contb_receipt_amt,probs = .25),
                         quantile(Contributions[Contributions$gender=="F",]$contb_receipt_amt,probs = .95)))+
  guides(fill=FALSE)

ggplot(aes(x=party,y=contb_receipt_amt),data=plotCont)+
  geom_boxplot(aes(alpha=.1))+
  coord_cartesian(ylim=c(quantile(Contributions[Contributions$party=="R",]$contb_receipt_amt,probs=.25),
                         quantile(Contributions[Contributions$party=="R",]$contb_receipt_amt,probs=.95)))+
  guides(fill=FALSE) #plotting boxplots for both contributions by gender and party with the 25th and 95th percentiles

                     #used to give a better picture of the plots.

Here differences between gender and party are shown in terms of contributions, although, oddly the “Other” party types have a higher median amount of contributions when compared to Republicans and Democrats, although both of the latter have very high outliers throughout the plots. This is probably due to the lower count of contributions that was donated to the Other parties.

namelist<-unique(Contributions$names)
for (i in namelist){
  assign(i,subset(plotCont,names==i)%>% #Assigns each candidate name to a dataframe of cities and counts.
    group_by(names,contbr_city,contbr_st) %>% #Grouping by city and state to get counts for each for each candidate.
    summarize(n=n()) %>%
    ungroup() %>%
    ungroup())
  i<-arrange(get(i),desc(n))
  print(head(i,n = 10)) #printing tables of the top ten cities in terms of count of contributions from there.
}
## # A tibble: 10 x 4
##     names contbr_city contbr_st     n
##    <fctr>      <fctr>    <fctr> <int>
##  1  Rubio       MIAMI        FL  2258
##  2  Rubio    NEW YORK        NY  1538
##  3  Rubio     HOUSTON        TX  1349
##  4  Rubio      NAPLES        FL  1258
##  5  Rubio      DALLAS        TX  1214
##  6  Rubio LOS ANGELES        CA   874
##  7  Rubio  WASHINGTON        DC   858
##  8  Rubio     ATLANTA        GA   679
##  9  Rubio     CHICAGO        IL   645
## 10  Rubio  BOCA RATON        FL   608
## # A tibble: 10 x 4
##       names       contbr_city contbr_st     n
##      <fctr>            <fctr>    <fctr> <int>
##  1 Santorum            DALLAS        TX    59
##  2 Santorum        PITTSBURGH        PA    35
##  3 Santorum     OVERLAND PARK        KS    33
##  4 Santorum          MCKINNEY        TX    31
##  5 Santorum         RICHFIELD        OH    29
##  6 Santorum         LAFAYETTE        LA    25
##  7 Santorum       GREAT FALLS        VA    21
##  8 Santorum         PALO ALTO        CA    20
##  9 Santorum            DARIEN        CT    19
## 10 Santorum PANAMA CITY BEACH        FL    19
## # A tibble: 10 x 4
##     names     contbr_city contbr_st     n
##    <fctr>          <fctr>    <fctr> <int>
##  1  Perry         HOUSTON        TX    69
##  2  Perry          DALLAS        TX    64
##  3  Perry          AUSTIN        TX    58
##  4  Perry         LUBBOCK        TX    38
##  5  Perry     SAN ANTONIO        TX    36
##  6  Perry      FORT WORTH        TX    23
##  7  Perry         MIDLAND        TX    18
##  8  Perry     BAKERSFIELD        CA    17
##  9  Perry FORT LAUDERDALE        FL    12
## 10  Perry         MISSION        TX    12
## # A tibble: 10 x 4
##     names      contbr_city contbr_st     n
##    <fctr>           <fctr>    <fctr> <int>
##  1 Carson          HOUSTON        TX  1486
##  2 Carson      SAN ANTONIO        TX  1240
##  3 Carson           DALLAS        TX  1130
##  4 Carson COLORADO SPRINGS        CO  1047
##  5 Carson          PHOENIX        AZ  1021
##  6 Carson           TUCSON        AZ   885
##  7 Carson        LAS VEGAS        NV   866
##  8 Carson        SAN DIEGO        CA   801
##  9 Carson        CHARLOTTE        NC   775
## 10 Carson       CINCINNATI        OH   686
## # A tibble: 10 x 4
##     names contbr_city contbr_st     n
##    <fctr>      <fctr>    <fctr> <int>
##  1   Cruz     HOUSTON        TX 14933
##  2   Cruz SAN ANTONIO        TX  5089
##  3   Cruz      DALLAS        TX  5014
##  4   Cruz      AUSTIN        TX  4044
##  5   Cruz  FORT WORTH        TX  3660
##  6   Cruz      SPRING        TX  3323
##  7   Cruz   SAN DIEGO        CA  1981
##  8   Cruz        KATY        TX  1967
##  9   Cruz   LAS VEGAS        NV  1897
## 10   Cruz   ARLINGTON        TX  1895
## # A tibble: 10 x 4
##     names   contbr_city contbr_st     n
##    <fctr>        <fctr>    <fctr> <int>
##  1   Paul       HOUSTON        TX   365
##  2   Paul        AUSTIN        TX   312
##  3   Paul      SAN JOSE        CA   294
##  4   Paul     LAS VEGAS        NV   196
##  5   Paul       SEATTLE        WA   187
##  6   Paul        DALLAS        TX   181
##  7   Paul   SAN ANTONIO        TX   179
##  8   Paul      NEW YORK        NY   175
##  9   Paul    LOUISVILLE        KY   165
## 10   Paul BOWLING GREEN        KY   149
## # A tibble: 10 x 4
##      names   contbr_city contbr_st      n
##     <fctr>        <fctr>    <fctr>  <int>
##  1 Clinton      NEW YORK        NY 155804
##  2 Clinton    WASHINGTON        DC  71610
##  3 Clinton   LOS ANGELES        CA  65166
##  4 Clinton SAN FRANCISCO        CA  56833
##  5 Clinton      BROOKLYN        NY  56711
##  6 Clinton       CHICAGO        IL  46440
##  7 Clinton       SEATTLE        WA  44346
##  8 Clinton       HOUSTON        TX  33076
##  9 Clinton        AUSTIN        TX  30261
## 10 Clinton      PORTLAND        OR  27821
## # A tibble: 10 x 4
##      names     contbr_city contbr_st     n
##     <fctr>          <fctr>    <fctr> <int>
##  1 Sanders        NEW YORK        NY 40318
##  2 Sanders         SEATTLE        WA 35006
##  3 Sanders     LOS ANGELES        CA 31840
##  4 Sanders   SAN FRANCISCO        CA 31077
##  5 Sanders        PORTLAND        OR 29074
##  6 Sanders        BROOKLYN        NY 26807
##  7 Sanders         CHICAGO        IL 24410
##  8 Sanders          AUSTIN        TX 16387
##  9 Sanders       SAN DIEGO        CA 14864
## 10 Sanders WEST SOMERVILLE        MA 13562
## # A tibble: 10 x 4
##      names contbr_city contbr_st     n
##     <fctr>      <fctr>    <fctr> <int>
##  1 Fiorina     HOUSTON        TX   361
##  2 Fiorina      DALLAS        TX   291
##  3 Fiorina    NEW YORK        NY   279
##  4 Fiorina   SAN DIEGO        CA   219
##  5 Fiorina     ATLANTA        GA   196
##  6 Fiorina      AUSTIN        TX   196
##  7 Fiorina  SCOTTSDALE        AZ   171
##  8 Fiorina  ALEXANDRIA        VA   161
##  9 Fiorina LOS ANGELES        CA   149
## 10 Fiorina  MANDEVILLE        LA   146
## # A tibble: 10 x 4
##       names contbr_city contbr_st     n
##      <fctr>      <fctr>    <fctr> <int>
##  1 Huckabee LITTLE ROCK        AR   114
##  2 Huckabee   TEXARKANA        AR    75
##  3 Huckabee    AMARILLO        TX    60
##  4 Huckabee OREGON CITY        OR    56
##  5 Huckabee   TEXARKANA        TX    55
##  6 Huckabee     HOUSTON        TX    51
##  7 Huckabee      CONWAY        AR    48
##  8 Huckabee   HYDE PARK        MA    45
##  9 Huckabee      DESTIN        FL    44
## 10 Huckabee      DALLAS        TX    42
## # A tibble: 10 x 4
##     names   contbr_city contbr_st     n
##    <fctr>        <fctr>    <fctr> <int>
##  1 Pataki      NEW YORK        NY    58
##  2 Pataki     LAS VEGAS        NV    13
##  3 Pataki         BRONX        NY    11
##  4 Pataki SAN FRANCISCO        CA     7
##  5 Pataki      SAN JUAN        PR     6
##  6 Pataki    ALEXANDRIA        VA     5
##  7 Pataki      BROOKLYN        NY     5
##  8 Pataki        CARMEL        NY     5
##  9 Pataki        NAPLES        FL     5
## 10 Pataki  NEW ROCHELLE        NY     5
## # A tibble: 10 x 4
##       names   contbr_city contbr_st     n
##      <fctr>        <fctr>    <fctr> <int>
##  1 O'Malley     BALTIMORE        MD   523
##  2 O'Malley    WASHINGTON        DC   234
##  3 O'Malley      NEW YORK        NY   181
##  4 O'Malley SILVER SPRING        MD   135
##  5 O'Malley      BETHESDA        MD   119
##  6 O'Malley       POTOMAC        MD    94
##  7 O'Malley     ANNAPOLIS        MD    77
##  8 O'Malley     ROCKVILLE        MD    76
##  9 O'Malley       CHICAGO        IL    70
## 10 O'Malley SAN FRANCISCO        CA    69
## # A tibble: 10 x 4
##     names        contbr_city contbr_st     n
##    <fctr>             <fctr>    <fctr> <int>
##  1 Graham         GREENVILLE        SC   296
##  2 Graham           COLUMBIA        SC   259
##  3 Graham           NEW YORK        NY   204
##  4 Graham         CHARLESTON        SC   125
##  5 Graham        SPARTANBURG        SC    83
##  6 Graham         ALEXANDRIA        VA    62
##  7 Graham             CHAPIN        SC    62
##  8 Graham       MYRTLE BEACH        SC    57
##  9 Graham HILTON HEAD ISLAND        SC    56
## 10 Graham        LOS ANGELES        CA    56
## # A tibble: 10 x 4
##     names  contbr_city contbr_st     n
##    <fctr>       <fctr>    <fctr> <int>
##  1   Bush     NEW YORK        NY  1397
##  2   Bush      HOUSTON        TX   907
##  3   Bush   WASHINGTON        DC   755
##  4   Bush        MIAMI        FL   715
##  5   Bush       DALLAS        TX   572
##  6   Bush  TALLAHASSEE        FL   408
##  7   Bush CORAL GABLES        FL   392
##  8   Bush   ALEXANDRIA        VA   342
##  9   Bush        TAMPA        FL   305
## 10   Bush      ATLANTA        GA   283
## # A tibble: 10 x 4
##     names contbr_city contbr_st     n
##    <fctr>      <fctr>    <fctr> <int>
##  1  Trump     HOUSTON        TX  7618
##  2  Trump      DALLAS        TX  4098
##  3  Trump   LAS VEGAS        NV  4061
##  4  Trump SAN ANTONIO        TX  3991
##  5  Trump    NEW YORK        NY  3476
##  6  Trump      NAPLES        FL  2909
##  7  Trump   SAN DIEGO        CA  2885
##  8  Trump      AUSTIN        TX  2817
##  9  Trump     PHOENIX        AZ  2768
## 10  Trump  SCOTTSDALE        AZ  2730
## # A tibble: 10 x 4
##     names  contbr_city contbr_st     n
##    <fctr>       <fctr>    <fctr> <int>
##  1 Jindal  BATON ROUGE        LA    81
##  2 Jindal  NEW ORLEANS        LA    49
##  3 Jindal     METAIRIE        LA    27
##  4 Jindal      HOUSTON        TX    20
##  5 Jindal    LAFAYETTE        LA    18
##  6 Jindal       KENNER        LA    16
##  7 Jindal LAKE CHARLES        LA    16
##  8 Jindal   MANDEVILLE        LA    15
##  9 Jindal   SHREVEPORT        LA    14
## 10 Jindal        HOUMA        LA    13
## # A tibble: 10 x 4
##       names   contbr_city contbr_st     n
##      <fctr>        <fctr>    <fctr> <int>
##  1 Christie      NEW YORK        NY   216
##  2 Christie       MENDHAM        NJ   122
##  3 Christie    MORRISTOWN        NJ   113
##  4 Christie        DALLAS        TX    91
##  5 Christie        SUMMIT        NJ    70
##  6 Christie BASKING RIDGE        NJ    66
##  7 Christie     WESTFIELD        NJ    58
##  8 Christie    LIVINGSTON        NJ    53
##  9 Christie        MCLEAN        VA    50
## 10 Christie     PRINCETON        NJ    47
## # A tibble: 10 x 4
##     names contbr_city contbr_st     n
##    <fctr>      <fctr>    <fctr> <int>
##  1 Walker      CARMEL        IN   117
##  2 Walker    NEW YORK        NY    98
##  3 Walker     HOUSTON        TX    93
##  4 Walker   MILWAUKEE        WI    89
##  5 Walker      DALLAS        TX    81
##  6 Walker  BROOKFIELD        WI    77
##  7 Walker     MADISON        WI    66
##  8 Walker     ATLANTA        GA    65
##  9 Walker     CHICAGO        IL    65
## 10 Walker      NAPLES        FL    63
## # A tibble: 10 x 4
##     names   contbr_city contbr_st     n
##    <fctr>        <fctr>    <fctr> <int>
##  1  Stein SAN FRANCISCO        CA   181
##  2  Stein      PORTLAND        OR   137
##  3  Stein      NEW YORK        NY   130
##  4  Stein       SEATTLE        WA   124
##  5  Stein       CHICAGO        IL   112
##  6  Stein      BROOKLYN        NY    95
##  7  Stein   LOS ANGELES        CA    92
##  8  Stein     SAN DIEGO        CA    60
##  9  Stein        EUGENE        OR    59
## 10  Stein         TULSA        OK    58
## # A tibble: 10 x 4
##     names   contbr_city contbr_st     n
##    <fctr>        <fctr>    <fctr> <int>
##  1   Webb    ALEXANDRIA        VA    36
##  2   Webb     ARLINGTON        VA    31
##  3   Webb  CHESTERFIELD        VA    22
##  4   Webb    WASHINGTON        DC    22
##  5   Webb      NEW YORK        NY    17
##  6   Webb       HOUSTON        TX    11
##  7   Webb      TILGHMAN        MD    10
##  8   Webb   LOS ANGELES        CA     9
##  9   Webb      RICHMOND        VA     9
## 10   Webb SAN FRANCISCO        CA     9
## # A tibble: 10 x 4
##     names contbr_city contbr_st     n
##    <fctr>      <fctr>    <fctr> <int>
##  1 Kasich    COLUMBUS        OH   646
##  2 Kasich    NEW YORK        NY   561
##  3 Kasich  CINCINNATI        OH   540
##  4 Kasich  WASHINGTON        DC   342
##  5 Kasich     HOUSTON        TX   263
##  6 Kasich     CHICAGO        IL   249
##  7 Kasich  ALEXANDRIA        VA   245
##  8 Kasich     ATLANTA        GA   226
##  9 Kasich   CLEVELAND        OH   203
## 10 Kasich      DUBLIN        OH   185
## # A tibble: 10 x 4
##      names    contbr_city contbr_st     n
##     <fctr>         <fctr>    <fctr> <int>
##  1 Gilmore       RICHMOND        VA    14
##  2 Gilmore        ROANOKE        VA    14
##  3 Gilmore     ALEXANDRIA        VA     4
##  4 Gilmore         MCLEAN        VA     4
##  5 Gilmore       NEW YORK        NY     3
##  6 Gilmore       VA BEACH        VA     3
##  7 Gilmore         AUSTIN        TX     2
##  8 Gilmore     FORT WORTH        TX     2
##  9 Gilmore FREDERICKSBURG        VA     2
## 10 Gilmore        HENRICO        VA     2
## # A tibble: 10 x 4
##     names   contbr_city contbr_st     n
##    <fctr>        <fctr>    <fctr> <int>
##  1 Lessig SAN FRANCISCO        CA    87
##  2 Lessig      NEW YORK        NY    57
##  3 Lessig     CAMBRIDGE        MA    45
##  4 Lessig   SAN ANSELMO        CA    45
##  5 Lessig       SEATTLE        WA    34
##  6 Lessig       CHICAGO        IL    27
##  7 Lessig      BROOKLYN        NY    24
##  8 Lessig      SAN JOSE        CA    20
##  9 Lessig    WASHINGTON        DC    19
## 10 Lessig   LOS ANGELES        CA    17
## # A tibble: 10 x 4
##      names contbr_city contbr_st     n
##     <fctr>      <fctr>    <fctr> <int>
##  1 Johnson    NEW YORK        NY   273
##  2 Johnson     HOUSTON        TX   176
##  3 Johnson ALBUQUERQUE        NM   167
##  4 Johnson   ROCHESTER        NY   136
##  5 Johnson      AUSTIN        TX   133
##  6 Johnson     CHICAGO        IL   121
##  7 Johnson      DALLAS        TX   118
##  8 Johnson   SAN DIEGO        CA   111
##  9 Johnson     SEATTLE        WA    85
## 10 Johnson LOS ANGELES        CA    83
## # A tibble: 10 x 4
##       names      contbr_city contbr_st     n
##      <fctr>           <fctr>    <fctr> <int>
##  1 McMullin            ALLEN        TX    30
##  2 McMullin   SALT LAKE CITY        UT    29
##  3 McMullin HUNTINGTON BEACH        CA    22
##  4 McMullin       GREENSBORO        NC    18
##  5 McMullin       ALEXANDRIA        VA    17
##  6 McMullin             OREM        UT    16
##  7 McMullin            PROVO        UT    16
##  8 McMullin      TALLAHASSEE        FL    16
##  9 McMullin       WASHINGTON        DC    16
## 10 McMullin        ARLINGTON        VA    14

I feel this is an extremely informative presentation of the data, as one can determine the top cities where each candidate received funds from. This information is highly valuable.Looking at just Clinton, she got the most from New York, D.C., and Los Angeles, Trump from HOuston, Dallas and Nevada (southern states) and Sanders from New York, Seattle, and Los Angeles.

for (i in namelist){a<-get(i)
                 a$location<-paste(as.character(a$city),as.character(a$contbr_city))
                 a$lat<- sapply(a$location,                          #Getting the longitude and latitude for each
                                        function(x)                  #candidate.               
                                          if (!is.na(x)){
                                            geocode(x,source ="dsk",messaging = FALSE)$lat}
                                        else {NA})
                 a$lon<- sapply(a$location,
                                       function(x) 
                                         if (!is.na(x)){
                                           geocode(x,source ="dsk",messaging = FALSE)$lon}
                                       else {NA})
                 assign(i,a) #Assigning the new information to the candidate dataframes.
                 remove(a)
}
for (i in namelist){
  map("world", fill=TRUE, col="white", bg="lightblue", ylim=c(-60, 90), mar=c(0,0,0,0))
  points(x=get(i)$lon,y=get(i)$lat,col="red")} #Plotting each candidate dataframe by city.

This visualization of contributions by city looks fine, but I believe this can be done in a somewhat more aesthetically pleasing way.

for (i in namelist){
  print(ggplot()+borders("world",colour="gray50",fill="gray50")+
          geom_point(data=get(i),mapping=aes(x=lon,y=lat),col='blue') #using borders and ggplot to plot the data.
        )
}

This looks better but at this point, the data can definitely be fine-tuned using other variables such as the sum of contributions.

for (i in namelist){
  a<-get(i)
  a$sum_of_cont<-(subset(plotCont,names==i)%>%
      group_by(names,contbr_city,contbr_st) %>% #Again grouping by city and state, but to get sum of contributions
      summarize(sum=sum(contb_receipt_amt),     #by city this time.
                n=n()) %>%
      ungroup() %>%
      ungroup())$sum
  assign(i,a)
  rm(a)
}
for (i in namelist){
  print(ggplot(aes(x=lon,y=lat),data=get(i))+borders("world",colour="gray50",fill="gray50")+
    geom_point(alpha=.75,aes(col=get(i)$n,size=get(i)$sum_of_cont/1000))+
      scale_color_gradient2(low = "red", mid = "white", high = "blue")) #Redoing the maps with size and color affected
}                                                                       #by sum and count.

These plots definitely look a lot better and contain a lot more information!

Final Visualizations

The summary for each candidate by mean, median, and sum of contributions, faceted by whether it was the primary or general elections was definitely a useful plot– reproduced below:

p1<-ggplot(aes(x=names,y=contb_receipt_amt,group=1),
       data=plotCont)+
  geom_bar(aes(fill='red'),stat="summary",fun.y=mean)+ #bar plot for mean of contbributions by candidate
  facet_wrap(~election_tp,nrow = 2)+                   #split into the primary and general elections.
  geom_point(alpha=.5,size=.75,stat="summary",fun.y=median)+
  geom_line(color='purple',stat="summary",fun.y=median,size=1.25)+ #line plot for median contributions.
  labs(x="Candidate Names",y="Mean Contribution Amount ($)")+
  theme(axis.text.x = element_text(size=10),
        axis.text.y = element_text(size=15),
        axis.title = element_text(size=15),
        strip.text = element_text(size=15))+
  guides(fill=FALSE)
p2<-ggplot(aes(x=names,y=contb_receipt_amt/1000000,group=1),
           data=plotCont)+
  ylab("Contribution Total ($ mil)")+          #line plot for sum of contributions by candidate split by primary and
  ggtitle("Contribution by Candidate")+        #general elections.
  geom_line(size=1.25,color='steelblue',stat="summary",fun.y=sum)+
  theme(plot.title = element_text(hjust=.5,size=22),axis.title.x=element_blank(),
        axis.text.x = element_text(size=10),
        axis.text.y = element_text(size=15),
        axis.title=element_text(size=15),
        strip.text = element_text(size=15))+
  facet_wrap(~election_tp,nrow=2)+
    guides(fill=FALSE)
grid.arrange(p2,p1)

The next plot I chose, as it is the next level of abstraction, shows the differences across parties.

ggplot(aes(x=party,y=contb_receipt_amt),data=plotCont)+
  geom_boxplot(aes(alpha=.1))+
  xlab("Party")+ylab("Contribution Amount")+ggtitle("Contributions by Party")+
  theme(plot.title = element_text(hjust=.5,size=22),
        axis.title = element_text(size=15),
        axis.text = element_text(size=13)
        )+
  coord_cartesian(ylim=c(quantile(Contributions[Contributions$party=="R",]$contb_receipt_amt,probs=.25),
                         quantile(Contributions[Contributions$party=="R",]$contb_receipt_amt,probs=.95)))+
  guides(fill=FALSE) #plotting boxplots for both contributions by gender and party with the 25th and 95th percentiles

                     #used to give a better picture of the plots.

Finally, the map visualization, for the main candidates is a worthwhile display of the data and the differences across these candidates:

#Redoing the maps with size and color affected by sum and count.
for (i in c("Clinton","Sanders","Trump")){
  print(ggplot(aes(x=lon,y=lat),data=get(i))+borders("world",colour="gray50",fill="gray50")+
    geom_point(aes(col=get(i)$n,size=get(i)$sum_of_cont/1000))+
      scale_color_gradient2(low="red",mid="orange",high="yellow")+
    ggtitle(paste(i," Contributions"))+theme(axis.title = element_text(size=15),
                     axis.text = element_text(size=15),
                     plot.title = element_text(hjust=.5,size=22),
                     legend.title = element_text(size=12)
    )+labs(color="Count",size="Sum of Contributions($Th.)")
    )
    }                                                                       

All three candidates seem to have similar locations contributing to them. Sanders garners donors from more locations than Clinton, and Trump garners even more locations than either!! But, both Sanders and Clinton seem to have areas with more-so higher contributions, though obviously less populated.

Results:

According to the visualizations, even though Sander’s campaign was deemed very popular, and was self-financing without the helps of PACs, other candidates still did much better than him in terms of donations! Furthermore, even lesser known, and barely in the race candidates, like Lessig, received high mean and median contributions– higer than Trump, Sanders or Clinton which suggests that these candidates (though probably did get just as high donations as the other candidates) received a majority of low amount donations. Finally, it seems the three main candidates got donations from across the globe, yet, again, it seems Trump, due to his world-wide fame garnered even more (count-wise), something not many people expected at all.

For improvements and further research, I would suggest attemping to do this for local elections, and then determining whether contributions actually affect winning rates (for Which the data here suggests they do not).